Evaluating the Jaccard-Tanimoto Index on Multi-core Architectures

Authors

  • Vipin Sachdeva
  • Douglas M. Freimuth
  • Chris Mueller
Abstract

The Jaccard/Tanimoto coefficient is an important workload, used in a wide variety of problems including drug design fingerprinting, clustering analysis, similarity web searching and image segmentation. This paper evaluates the Jaccard coefficient on the Cell/B.E. processor and the Intel® Xeon® dual-core platform. We have developed a novel parallel algorithm for all-to-all Jaccard comparisons, specially suited to the Cell/B.E. architecture, that minimizes DMA transfers and reuses data in the local store. We show that our implementation on Cell/B.E. outperforms the implementations on comparable Intel platforms by 6-20X with full accuracy, and by 10-50X in reduced accuracy mode, depending on the size of the data. In addition to performance, we discuss in detail our efforts to optimize our workload on both the Cell/B.E. and the Intel architectures, and explain how the avenues for optimization differ substantially between the two. Our work shows that the algorithms or kernels employed for the Jaccard coefficient calculation are heavily dependent on the traits of the target hardware.
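
For reference, the Jaccard/Tanimoto coefficient of two sets A and B is |A ∩ B| / |A ∪ B|. The following is a minimal Python sketch of an all-to-all comparison over bit fingerprints. It is not the paper's Cell/B.E. kernel and makes no attempt at DMA or local-store optimization; the names `tanimoto` and `all_pairs`, the sample fingerprints, and the convention for two empty fingerprints are assumptions made for illustration.

```python
from itertools import combinations


def tanimoto(a: int, b: int) -> float:
    """Jaccard/Tanimoto coefficient of two bit fingerprints stored as ints."""
    union = bin(a | b).count("1")  # bits set in either fingerprint
    if union == 0:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = bin(a & b).count("1")  # bits set in both fingerprints
    return shared / union


def all_pairs(fingerprints):
    """Score every unordered pair (i, j) with i < j."""
    return {
        (i, j): tanimoto(fingerprints[i], fingerprints[j])
        for i, j in combinations(range(len(fingerprints)), 2)
    }


# Hypothetical 4-bit fingerprints, for illustration only.
fingerprints = [0b1101, 0b1001, 0b0110]
scores = all_pairs(fingerprints)
```

An all-to-all comparison of n fingerprints scores n(n-1)/2 pairs, which is why data reuse matters as the data set grows.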

Similar resources

Correcting Jaccard and other similarity indices for chance agreement in cluster analysis

Correcting a similarity index for chance agreement requires computing its expectation under fixed marginal totals of a matching counts matrix. For some indices, such as Jaccard, Rogers and Tanimoto, Sokal and Sneath, and Gower and Legendre the expectations cannot be easily found. We show how such similarity indices can be expressed as functions of other indices and expectations found by approxi...

Fully Convolutional Architectures for Multi-Class Segmentation in Chest Radiographs

The success of deep convolutional neural networks on image classification and recognition tasks has led to new applications in very diversified contexts, including the field of medical imaging. In this paper we investigate and propose neural network architectures within the context of automated segmentation of anatomical organs in chest radiographs, namely for lungs, clavicles and heart. The pr...

Tanimoto's Best Barbecue: Discovering Regulatory Modules using Tanimoto Scores

We present a combinatorial method for discovering cis-regulatory modules in promoter sequences. Our approach combines “sliding window” approaches with a scoring function based on the so-called Tanimoto score. This allows us to identify sets of binding sites that tend to occur preferentially in the vicinity of each other in a given set of promoter sequences belonging to co-expressed or orthologous ...

Comparison of similarity coefficients used for cluster analysis with dominant markers in maize (Zea mays L)

The objective of this study was to evaluate whether different similarity coefficients used with dominant markers can influence the results of cluster analysis, using eighteen inbred lines of maize from two different populations, BR-105 and BR-106. These were analyzed by AFLP and RAPD markers and eight similarity coefficients were calculated: Jaccard, Sorensen-Dice, Anderberg, Ochiai, Simple-mat...

Implicitly Defined Substructure Fingerprints for Support Vector Machines

For the calculation of the Tanimoto similarity of two molecules, only the patterns that occur in at least one of them are needed. These can be obtained on-the-fly by a generation method. The substructure set is generated for each of the molecules, and each of the substructures is checked to see whether it is also contained in the other set. For the Tanimoto Coefficient it is sufficient to know the cardi...

Publication date: 2009